Annotating Syllable Corpora with Linguistic Data Categories in XML
نویسندگان
چکیده
The usefulness of high quality annotated corpora as a development aid in computational linguistic applications is now well understood. Therefore it is necessary to have systematic, easily understandable and effective means for annotating corpora at many levels of linguistic description using. This paper presents a three step methodology for annotating speech corpora using linguistic data categories in XML and provides a concrete example of how such an annotated corpus can be exploited and further enhanced by a syllable recognition system.
منابع مشابه
Tools for hierarchical annotation of typed dialogue
We discuss a set of tools for annotating a complex hierarchical and linguistic structure of tutorial dialogue based on the NITE XML Toolkit (NXT) (Carletta et al., 2003). The NXT API supports multi-layered stand-off data annotation and synchronisation with timed and speech data. Using NXT, we built a set of extensible tools for detailed structure annotation of typed tutorial dialogue, collected...
متن کاملAnnotating Corpora from Various Sources in the Humanities Domain
Voula Giouli Annotating corpora from various sources in the humanities domain: shortcomings and issues In this paper, we present work aimed at the linguistic annotation of Greek corpora that belong to the humanities domain, the focus being on the methodological principles as well as the implementation framework adopted. This framework builds on an existin...
متن کاملStep by step: underspecified markup in incremental rhetorical analysis
While quite a few linguistic corpora with syntactic annotations are available today, resources are scarce on the level of discourse annotation. A flexible, extendible annotation format speeds up development. We therefore propose an XML format for annotating rhetorical structure trees. In human and automatic analysis, rhetorical structure is often difficult and assigned incrementally. Thus, the ...
متن کاملAn Integrated Tool for Annotating Historical Corpora
E-Dictor is a tool for encoding, applying levels of editions, and assigning part-ofspeech tags to ancient texts. In short, it works as a WYSIWYG interface to encode text in XML format. It comes from the experience during the building of the Tycho Brahe Parsed Corpus of Historical Portuguese and from consortium activities with other research groups. Preliminary results show a decrease of at leas...
متن کاملA Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles
Epistemic modality devices are believed to be one of the prominent characteristics of research articles as the commonly used genre among the academic community members. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to cross–culturally and cross-linguistically investigate epistemic modality markers as an important subcategory...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004